The atomistic characterization of the transition state is a fundamental step to improve
the understanding of the folding mechanism and the function of proteins. From
a computational point of view, the identification of the conformations that build out
the transition state is particularly cumbersome, mainly because of the large computational
cost of generating a statistically-sound set of folding trajectories. Here we
show that a biasing algorithm, based on the physics of the ratchet-and-pawl, can be
used to identify efficiently the transition state. The basic idea is that the algorithmic
ratchet exerts a force on the protein when it is climbing the free-energy barrier,
while it is inactive when it is descending. The transition state can be identified as
the point of the trajectory where the ratchet changes regime. Besides discussing this
strategy in general terms, we test it within a protein model whose transition state
can be studied independently by plain molecular dynamics simulations. Finally, we
show its power in explicit-solvent simulations, obtaining and characterizing a set of
transition-state conformations for ACBP and CI2.

I. INTRODUCTION



The transition state of biomolecular processes is particularly important because it is the
main determinant of the associated rate. Unfortunately, being the most unstable state of
the process, its characterization is difficult. In the case of protein folding, a perturbative
technique in which one measures the effect of amino-acid mutations on folding/unfolding
rates has been successful in providing a structural characterization of this evanescent state.
Although this procedure is experimentally rather demanding, we now have information about
the structure of the transition state of tens of proteins.

With the improvement of the force fields that describe the interactions in proteins, it
becomes more and more interesting to attempt to characterize the folding transition state
without employing experimental information. Within this context, the determination
of the transition state implies two challenging problems, namely the generation of folding
trajectories and the identification of the transition state along each of them. Concerning
the former, the most straightforward way is simply to perform molecular-dynamics (MD)
simulations solving the equation of motion of the system. In the case of proteins of realistic
size and using realistic force fields in explicit solvent, generating a statistically-sound number
of folding trajectories is not trivial even if one can use the fastest computers available.
Smarter techniques, comprising transition path sampling, milestoning and dominant
reaction pathways, exploit the fact that only a small subset of all possible trajectories
is statistically relevant, but these methods are computationally efficient only when the total
number of atoms is not large (typically in implicit-solvent models). Even if one can generate
folding trajectories efficiently, the problem of identifying the transition state is still hard.
The transition state between two (meta)stable states is built out of the set of conformations
for which the probability of falling down to each of them is 1/2. Consequently, the most
direct way to identify the transition state is to start several MD simulations from each
of the conformations selected from a folding trajectory, and to count the fraction of such
trajectories which meet the native state before meeting the denatured state (or vice versa),
until this fraction is 1/2. This procedure is very time consuming, but is the only
safe way to identify the transition state.
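
The counting procedure described above can be illustrated on a toy system. The following is a minimal sketch, not the protein calculation: it assumes a one-dimensional double-well potential U(x) = (x^2 - 1)^2 with overdamped Langevin dynamics, and the function names, the temperature and the time step are hypothetical choices made only for this illustration.

```python
import math
import random


def force(x):
    """Force -dU/dx for the toy double-well potential U(x) = (x**2 - 1)**2."""
    return -4.0 * x * (x * x - 1.0)


def committor(x0, n_shots=400, temperature=0.3, gamma=1.0, dt=2e-3, seed=0):
    """Fraction of trajectories started at x0 that reach x >= 1 before x <= -1."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * temperature * dt / gamma)  # noise amplitude per step
    hits = 0
    for _ in range(n_shots):
        x = x0
        while -1.0 < x < 1.0:  # integrate until one of the two basins is reached
            x += force(x) * dt / gamma + sigma * rng.gauss(0.0, 1.0)
        if x >= 1.0:
            hits += 1
    return hits / n_shots
```

In this toy model the two minima at x = -1 and x = +1 play the role of the denatured and native states, and a starting point belongs to the transition state when the returned fraction is close to 1/2, which by symmetry happens at the barrier top x0 = 0.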

Some years ago, Marchi and Ballone introduced the idea of biasing MD simulations to
generate efficiently trajectories between conformations of a system, using an algorithm based
on the physics of the ratchet-and-pawl. It consists in defining a ratcheting coordinate y and
damping the thermal fluctuations along the direction of y opposite to the desired target. The
algorithm was later used to enhance the thermal unfolding of proteins interacting with an
implicit-solvent force field. Recently, ratcheted MD simulations were used to repeatedly
simulate the folding of single-domain proteins in explicit solvent. Using a simplified protein
model, a Beccara et al. employed an Onsager-Machlup functional and showed that the
ratcheted MD algorithm produces trajectories that are overall statistically relevant, thus
validating that approach.

In what follows, we will investigate whether it is possible to use ratcheted MD simulations 
to obtain directly and efficiently a good approximation of the conformations which build out 
the transition state of protein folding. The basic idea is that while climbing the main free- 
energy barrier which separates the denatured from the folded state, the ratchet exerts work, 
while descending on the other side it is essentially off. The transition between the two 
regimes marks the transition state. 

Although the ultimate goal of this work is to develop a method that can be used for realistic
systems in explicit solvent, we first validate it using a model whose folding trajectories can
be generated by plain MD and whose transition state can be obtained exactly using the
committor method.

II. THE MODEL AND THE SIMULATIONS 

A model which is suitable for developing a computational strategy and validating it
against the transition state obtained with the exact method is a modified all-atom Go model,
where a non-specific interaction between hydrophobic atoms is added on top of the
native-structure-based interaction. The Go model ensures that folding can be simulated
repeatedly also without the ratcheting algorithm, so that reference trajectories can be
obtained, while the hydrophobic interaction makes the energy landscape more rugged, and
thus more realistic. In the Go implementation used here, pairs of atoms building native
contacts interact with a Lennard-Jones potential whose minimum lies at $\epsilon_0 = -0.62$
(in arbitrary units). The hydrophobic potential also has the Lennard-Jones form and acts
between side-chain carbons of ALA, VAL, LEU, ILE, PHE and TRP. The minimum of the potential
lies at a distance of 0.35 nm, where the depth is $\epsilon_{hy} = -0.3$. This value of
$\epsilon_{hy}$ has been chosen because it is the lowest which guarantees the folding of the
proteins studied within an RMSD of 0.3 nm from the experimental native conformation.
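
As an illustration of the hydrophobic term, the sketch below evaluates a Lennard-Jones energy parametrized by the position and depth of its minimum. The 12-6 functional form is an assumption (the text only states that the potential has "the Lennard-Jones form"), and the function name is hypothetical.

```python
def lennard_jones(r, r_min, depth):
    """12-6 Lennard-Jones energy with its minimum, equal to -depth, at r = r_min.

    The 12-6 exponents are an assumption made for this sketch; depth > 0.
    """
    s6 = (r_min / r) ** 6
    return depth * (s6 * s6 - 2.0 * s6)
```

With the values quoted above for the hydrophobic interaction, `lennard_jones(0.35, 0.35, 0.3)` returns the minimum energy of -0.3 (arbitrary units) at a distance of 0.35 nm.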

The simulations were carried out with a modified version of Gromacs, using the
topologies generated with the SMOG web server. The time step used is 0.002 ps (time
units are merely nominal).

The specific heat, calculated with parallel-tempering simulations, is displayed in Fig. 1
and compared with that for a plain Go model. As expected, the two-state character of
the denaturation transition is diminished. However, the stability of the native state is
increased, suggesting that the hydrophobic interaction introduced in the model favors the
native conformation, where hydrophobic packing is optimized, more than the denatured state.
On the basis of this specific-heat plot, we used the trajectory obtained at T = 1 to generate
10 uncorrelated unfolded conformations to be used as initial states of the folding simulations.
The folding simulations were carried out at T = 0.91, which is regarded as room temperature.

From each of the 10 unfolded conformations we carried out 10 simulations at T = 0.91
for 6 ns each. The average folding time, defined as the time needed to reach an RMSD of 0.4
nm, is $\tau_f = 1505$ ps.

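Similar simulations were carried out using the ratcheting algorithm. The ratchet is
implemented as in previous work, that is, by adding to the molecular potential a ratcheting term

$$V_{rat}(\rho(t)) = \begin{cases} \frac{k}{2}\left(\rho(t) - \rho_m(t)\right)^2, & \rho(t) > \rho_m(t) \\ 0, & \rho(t) \le \rho_m(t), \end{cases} \tag{1}$$

where

$$\rho(t) = \left(y(t) - y_{target}\right)^2 \tag{2}$$

and

$$\rho_m(t) = \min_{0<\tau<t} \rho(\tau). \tag{3}$$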
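
The bookkeeping implied by Eqs. (1)-(3) can be sketched as follows. This is a minimal illustration, not the modified Gromacs code used in the simulations; the class name `Ratchet` and its method are hypothetical.

```python
class Ratchet:
    """Tracks the running minimum rho_m and evaluates the ratchet energy of Eq. (1)."""

    def __init__(self, k, y_target=0.0):
        self.k = k                # ratcheting constant
        self.y_target = y_target  # target value of the coordinate y
        self.rho_min = None       # running minimum of rho, Eq. (3)

    def update(self, y):
        """Return (V_rat, rho - rho_m) for the current value y of the coordinate."""
        rho = (y - self.y_target) ** 2            # Eq. (2)
        if self.rho_min is None or rho < self.rho_min:
            self.rho_min = rho                    # new minimum: the ratchet is silent
        delta = rho - self.rho_min
        return 0.5 * self.k * delta ** 2, delta   # Eq. (1)
```

Calling `update` once per MD step with the current value of y returns a vanishing energy whenever the system makes progress towards the target, and a harmonic penalty whenever it moves back.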



The ratcheting coordinates y(t) used in the present work are either the distance $d_{CM}$ of
the contact map of a given protein conformation from the native contact map, or the RMSD (in
both cases $y_{target} = 0$). The distance $d_{CM}$, introduced by Bonomi et al., is defined as

$$d_{CM} = \| C - \tilde{C} \| = \Big( \sum_{j>i+2} \big( C_{ij} - \tilde{C}_{ij} \big)^2 \Big)^{1/2}, \tag{4}$$

where $C_{ij}$ is the (i,j) element of an N x N matrix defined as

$$C_{ij} = \begin{cases} \dfrac{1 - (r_{ij}/r_0)^p}{1 - (r_{ij}/r_0)^q}, & r_{ij} < r_{cut} \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$

$r_{ij}$ is the distance between atoms i and j, and $\tilde{C}$ is the same matrix evaluated on
the native state. The parameters used in these simulations are p = 6, q = 10, $r_0$ = 0.75 nm
and $r_{cut}$ = 1.23 nm.
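
A minimal sketch of the contact-map distance of Eqs. (4)-(5) is given below, using the parameters quoted in the text. The functions take generic lists of three-dimensional coordinates (in nm); which atoms enter the map is left to the caller, and all function names are hypothetical.

```python
import math

P, Q, R0, R_CUT = 6, 10, 0.75, 1.23  # p, q, r_0 (nm) and r_cut (nm) from the text


def contact(r):
    """Smooth contact function of Eq. (5) for an inter-atomic distance r (nm)."""
    if r >= R_CUT:
        return 0.0
    x = r / R0
    if abs(x - 1.0) < 1e-12:
        return P / Q  # analytic limit of the ratio at r = r_0
    return (1.0 - x ** P) / (1.0 - x ** Q)


def contact_map(coords):
    """Contact values for all pairs with j > i + 2, as in the sum of Eq. (4)."""
    n = len(coords)
    return {(i, j): contact(math.dist(coords[i], coords[j]))
            for i in range(n) for j in range(i + 3, n)}


def d_cm(coords, native_coords):
    """Euclidean distance between a contact map and the native one, Eq. (4)."""
    c = contact_map(coords)
    c_native = contact_map(native_coords)
    return math.sqrt(sum((c[pair] - c_native[pair]) ** 2 for pair in c))
```

By construction `d_cm` vanishes when the conformation coincides with the native one and grows as native contacts are lost, which is what makes it usable as a ratcheting coordinate with $y_{target} = 0$.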

Both in the case of plain-MD and ratcheted simulations, the sequence of events along
the folding trajectories under each set of conditions was studied by calculating the matrix
$M^k_{ij} = \theta(t(j,k) - t(i,k))$, where t(i,k) is the time at which the ith contact is
stably formed in the kth simulation and $\theta$ is the Heaviside step function. This matrix
satisfies $M_{ij} + M_{ji} = 1$, and each element $M_{ij}$ assumes the value 1 if the formation
of the ith contact precedes the formation of the jth, 0 if it follows it, and 1/2 if the two
are uncorrelated. The average matrix

$$\bar{M}_{ij} = \frac{1}{n_s} \sum_{k=1}^{n_s} M^k_{ij}, \tag{6}$$

where $n_s = 100$ is the number of trajectories, is interpreted as the probability that the
formation of the ith contact precedes the formation of the jth. A quantity related to
$\bar{M}_{ij}$ is the probability $A_j = \sum_{i \neq j} \bar{M}_{ij}/(n_s - 1)$ that the jth
contact is formed after any other contact.
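
The construction of the precedence matrices can be sketched as follows, under the verbal convention stated above (an element equals 1 when contact i forms before contact j). The input format, a list of contact-formation times per trajectory, and the function names are assumptions made for this illustration.

```python
def precedence_matrix(times):
    """M[i][j] = 1 if contact i forms before contact j, 0 if after, 1/2 if tied."""
    n = len(times)
    m = [[0.5] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if times[i] < times[j]:
                m[i][j] = 1.0
            elif times[i] > times[j]:
                m[i][j] = 0.0
    return m


def average_matrix(all_times):
    """Average of the single-trajectory matrices over all trajectories."""
    mats = [precedence_matrix(t) for t in all_times]
    n = len(all_times[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]
```

For two trajectories in which the first two contacts form in opposite order while the third always forms last, the averaged element between the first two contacts is 1/2, while the elements involving the third contact are 0 or 1.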

The order of contact formation in two trajectories was compared using the distance 



between the associated matrices, where $\delta$ is the Kronecker symbol.

III. RATCHETED TRAJECTORIES 

A necessary condition for the ratcheting algorithm to identify the correct transition state
of folding is that it generates statistically-relevant trajectories. Failure of this condition
would lead to the identification of free-energy saddle points not corresponding to the main
transition state of the folding process. Ratchet-generated trajectories, taken as they are,
are not expected to carry a large statistical weight, because the corresponding folding
time lies in the low-probability initial region of the folding-time distribution. However, as
suggested in previous work and validated in this Section in the case of two model proteins,
ratcheted MD simulations can provide the most probable sequence of contact formation if
carried out in appropriate conditions. In this respect, ratcheted trajectories can be regarded
as a coarse graining over time of the actual trajectories, in which the time-scale information
is lost.

As a reference we generated 100 trajectories with plain MD simulations. The average
folding time was 1505 ps and all trajectories reached the native conformation within the 10000
ps made available for each of them. The mean distance d between each pair of matrices
$M_{ij}$ (cf. Eq. (7)) is 0.37, indicating that the sequences of events along the different
folding trajectories are rather homogeneous. Briefly, this sequence implies first the
formation of most contacts in the two terminal helices, then in the central helices and then
the tertiary contacts.

Similar simulations were carried out starting from the same set of unfolded conformations,
ratcheting the simulation along the distance $d_{CM}$ of the contact map to the native one
with different values of the ratcheting constant k. Not all the trajectories folded to the
native conformation: some of them got stuck, reducing drastically the diffusivity of the
different parts of the protein and, essentially, freezing into non-native conformations. These
are excluded from the analysis that follows. The fraction of stuck trajectories, displayed in
the upper panel of Fig. 2, increases with k. The same figure also displays the average folding
time, which decreases as the effect of the ratchet is increased. The average folding time of
ratcheted simulations has the only purpose of measuring the computational time needed to
generate a folding trajectory, and has no physical meaning. The figure indicates that there
is a range of values of k around unity where simulations generate fast trajectories to the
native conformations.

To assess the physical meaning of such trajectories, we compared the order of
native-contact formation to that of the unbiased simulations. The mean distance d between each
pair of ratcheted trajectories is around 0.3 for any value of k and for the unbiased trajectories
(see lower panel in Fig. 2), indicating that the sequence of events in the ratcheted simulations
is as homogeneous as that of the MD trajectories. Also the mean distance between the
matrices associated with ratcheted trajectories and those associated with plain-MD trajectories
is 0.39 at all the values of k considered. Comparing the average matrices $\bar{M}_{ij}$, one
finds that the root mean square error between the matrix $\bar{M}_{ij}$ generated by ratcheting
the simulations and that generated with plain MD is around 0.3 for all values of k (cf. Fig. 2).

Summing up, the difference between ratcheted and plain-MD trajectories is comparable
with the (small) differences between pairs of plain-MD trajectories. Even when the ratchet is
strong, although the fraction of folding trajectories drops drastically, the sequence of events
in the few folding trajectories is correct.

A similar analysis has been carried out ratcheting the simulation through a different
coordinate, that is the RMSD with respect to the native conformation. Usually this is
regarded as a bad reaction coordinate. In fact, attempts to fold small proteins in explicit
solvent ratcheting along the RMSD coordinate at different values of the ratcheting constant
have failed. The results of such simulations are displayed in Fig. 3. Also in this case
folding simulations display a sequence of events that is similar to the one generated by
unbiased simulations. The main difference with the data obtained ratcheting along $d_{CM}$
is that in the present case there is no range of values of k at which the ratchet is efficient.
At small values of k the folding time $\tau_f$ is essentially identical to that of the unbiased
MD simulations. Only using values of k larger than 10 can one observe a relevant decrease of
$\tau_f$, but here the fraction of folding trajectories has become negligible.

ACBP is considered to fold according to a hierarchical diffusion-collision model, where
elements of secondary structure are formed first, then diffuse around until they bind
together to form native tertiary contacts. This pattern, which is also observed in the present
simulations, could favor the applicability of the ratchet. To check the generality of the above
results we have tested the method on another case, namely CI2, which is considered the prototype
of proteins which fold according to a nucleation model, without populating consistently
secondary structures prior to the transition state. The results are displayed in Fig. 4. Also in
this case there is a range of k where the ratchet is efficient, that is both the folding time
and the fraction of stuck trajectories are small. The efficiency is smaller than in the case
of ACBP, probably because in this case the contact-map distance $d_{CM}$ is not as good a
reaction coordinate as for a protein folding through a diffusion-collision scenario. In any case,
the sequence of events agrees with that of a plain MD simulation, within the range
of variability of the latter (which is somewhat smaller than that of ACBP).

IV. IDENTIFICATION OF THE TRANSITION STATE: THE STRATEGY



The analysis of the time-dependence of the degrees of freedom associated with the ratchet
can provide some information to localize the transition state of the system. The basic idea
is that, as the system climbs the free-energy barrier whose top is the transition state, the
ratchet is very active and thus $V_{rat}$ is well above zero. When the system crosses the
transition state and descends the free-energy barrier, the ratchet is essentially inactive and
$V_{rat}$ is small. The point of the trajectory where $V_{rat}$ drops is hypothesized to be the
transition state.

Before verifying this hypothesis, we attempt to formalize the above idea in a simple
scenario where the molecular force can be approximated in an elementary form. Assume
that the degrees of freedom x of the system follow an overdamped dynamics

$$\frac{dx}{dt} = \frac{1}{\gamma}\left[ f(x) - k\,\Delta\rho\; u_\rho + \eta \right], \tag{8}$$

where $\gamma$ is the friction coefficient; $\eta$ is the thermal noise, satisfying
$\langle \eta(t) \cdot \eta(t') \rangle = (2NT\gamma)\,\delta(t - t')$; $u_\rho$ is the unit
vector that defines the direction of the ratcheting coordinate $\rho$;
$\Delta\rho(t) = \rho - \rho_m$ is the difference between the value of the ratcheting coordinate
and its minimum; and Boltzmann's constant is set to 1. Let us assume that $\rho$ is a good
reaction coordinate, that is, it moves according to the slowest time scale of the system, and
that the associated diffusion constant is approximately equal to that of the microscopic degrees
of freedom. Then, the dynamics of $\rho$ can be written as

$$\frac{d\rho}{dt} = \frac{1}{\gamma}\left[ f_\rho - k\,\Delta\rho + \eta \right], \tag{9}$$

where $f_\rho$ is the effective force which moves the one-dimensional degree of freedom $\rho$
(i.e., minus the gradient of the free energy). By virtue of its definition, $\rho_m$ follows the
dynamics

$$\frac{d\rho_m}{dt} = \begin{cases} \dfrac{d\rho}{dt}\,\theta(-d\rho/dt) & \text{if } \Delta\rho = 0 \\ 0 & \text{if } \Delta\rho > 0, \end{cases} \tag{10}$$

where $\theta$ is a step function that is 1 if its argument is positive and 0 otherwise.
Consequently the quantity $\Delta\rho$, which measures the activity of the ratchet, follows

$$\frac{d(\Delta\rho)}{dt} = \begin{cases} \dfrac{1}{\gamma}\left[ f_\rho - k\,\Delta\rho + \eta \right] & \text{if } \Delta\rho > 0 \text{ or } d\rho/dt > 0 \\ 0 & \text{if } \Delta\rho = 0 \text{ and } d\rho/dt < 0. \end{cases} \tag{11}$$

If the molecular force $f_\rho$ pushes the system downhill towards its target state and is
overwhelming with respect to the typical diffusive force (i.e., $f_\rho \ll -(2T\gamma/\Delta t)^{1/2}$),
then $\Delta\rho$ is approximately zero along the associated part of the trajectory.
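
The dynamics of Eq. (11) can be explored numerically with a simple Euler discretization. The sketch below assumes a constant effective force $f_\rho$, unit friction and arbitrary units; the function name, the time step and the run length are hypothetical choices for this illustration and are not taken from the simulations of this work.

```python
import math
import random


def mean_ratchet_displacement(f_rho, k=1.0, temperature=0.91, gamma=1.0,
                              dt=0.005, n_steps=200_000, seed=1):
    """Time average of delta = rho - rho_m for a constant effective force f_rho."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * temperature * dt / gamma)  # thermal kick per step
    delta = 0.0
    total = 0.0
    for _ in range(n_steps):
        step = (f_rho - k * delta) * dt / gamma + sigma * rng.gauss(0.0, 1.0)
        # If rho would fall below its running minimum, rho_m follows it,
        # so that delta stays at zero (second line of Eq. (11)).
        delta = max(delta + step, 0.0)
        total += delta
    return total / n_steps
```

With a positive force (climbing, away from the target) the time-averaged displacement is larger than with a negative force (descending, towards the target), which is the qualitative difference between the two regimes that the strategy exploits.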

A more common scenario is that where the system is running downhill, the diffusive term
is not negligible, but the molecular force is overwhelming with respect to the ratchet (i.e.,
$f_\rho \ll -k\Delta\rho$), so that we can neglect the term proportional to k in Eq. (11). To
keep things simple, let us focus on a fraction of the trajectory that is short enough that the
force $f_\rho$ can be approximated as constant. In this case $\Delta\rho$ experiences a diffusion
biased by a constant force, and by a trap at 0. In fact, when $\Delta\rho = 0$ the system can
move away only if $d\rho/dt > 0$ (cf. the condition controlling the first line of Eq. (11));
thus the exit rate $w_{exit}$ from the trap is proportional to the probability that
$\eta > |f_\rho|\,(\Delta t/2T\gamma)^{1/2}$. This case is analogous to that
of a massive particle diffusing on a slope with a trap at the bottom. Since the stochastic
noise $\eta$ is normally distributed, $w_{exit}$ is proportional to
$\mathrm{erfc}(|f_\rho|\,(\Delta t/2T\gamma)^{1/2})$. We can assign
to the trap an effective energy $U_{trap}$, so that $w_{exit}$ is equal to Kramers escape rate,
that is

$$\exp\left[ -\frac{U_{trap}}{T} \right] = \frac{1}{2}\,\mathrm{erfc}\left[ |f_\rho| \left( \frac{\Delta t}{2T\gamma} \right)^{1/2} \right]. \tag{12}$$



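In the neighborhood of 0, $\Delta\rho$ will soon populate a distribution given by

$$p(\Delta\rho) = \frac{1}{Z\epsilon} \begin{cases} 2\,\mathrm{erfc}\left[ |f_\rho| \left( \dfrac{\Delta t}{2T\gamma} \right)^{1/2} \right]^{-1} & \text{if } 0 < \Delta\rho < \epsilon \\ \exp\left[ -\dfrac{|f_\rho|\,\Delta\rho}{T} \right] & \text{if } \Delta\rho > \epsilon, \end{cases} \tag{13}$$

where $\epsilon$ is the (small) length which defines the trap and

$$Z = 2\,\mathrm{erfc}\left[ |f_\rho| \left( \frac{\Delta t}{2T\gamma} \right)^{1/2} \right]^{-1} + \frac{T}{|f_\rho|\,\epsilon}. \tag{14}$$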

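The average value of $\Delta\rho$ expected in this regime is then

$$\langle \Delta\rho \rangle = \frac{\epsilon^2\,\mathrm{erfc}\left[ |f_\rho| (\Delta t/2T\gamma)^{1/2} \right]^{-1} + T^2/f_\rho^2}{2\epsilon\,\mathrm{erfc}\left[ |f_\rho| (\Delta t/2T\gamma)^{1/2} \right]^{-1} + T/|f_\rho|} \;\xrightarrow[\epsilon \to 0]{}\; \frac{T}{|f_\rho|}. \tag{15}$$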



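On the other hand, if the system is climbing the free-energy barrier (i.e.,
$f_\rho \gg (2T\gamma/\Delta t)^{1/2}$), the conditions $\Delta\rho = 0$ and $d\rho/dt < 0$ in
Eq. (11) are never satisfied simultaneously. Consequently,

$$p(\Delta\rho) = \frac{1}{Z} \exp\left[ -\frac{\frac{1}{2} k \Delta\rho^2 - f_\rho \Delta\rho}{T} \right], \tag{16}$$

where

$$Z = \left( \frac{\pi T}{2k} \right)^{1/2} \exp\left[ \frac{f_\rho^2}{2kT} \right] \left( 1 + \mathrm{erf}\left[ \frac{f_\rho}{(2kT)^{1/2}} \right] \right), \tag{17}$$

giving the average

$$\langle \Delta\rho \rangle = \frac{(2/\pi)^{1/2}\, T k \exp\left[ -\dfrac{f_\rho^2}{2kT} \right] + f_\rho (Tk)^{1/2} \left( 1 + \mathrm{erf}\left[ \dfrac{f_\rho}{(2kT)^{1/2}} \right] \right)}{(k^3 T)^{1/2} \left( 1 + \mathrm{erf}\left[ \dfrac{f_\rho}{(2kT)^{1/2}} \right] \right)}. \tag{18}$$
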

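The transition state is the intermediate scenario where $f_\rho$ vanishes. Assuming
$|f_\rho| \ll (2T\gamma/\Delta t)^{1/2}$, one can neglect the molecular force in Eq. (11),
obtaining

$$p(\Delta\rho) = \frac{1}{Z\epsilon} \begin{cases} 2 & \text{if } 0 < \Delta\rho < \epsilon \\ \exp\left[ -\dfrac{k \Delta\rho^2}{2T} \right] & \text{if } \Delta\rho > \epsilon, \end{cases} \tag{19}$$

where $Z = 2 + (\pi T/2k)^{1/2}/\epsilon$. This is an ideal distribution, because it is unlikely
that the system spends enough time at the transition state to populate it. However, it can be
useful to obtain the average $\Delta\rho$ which separates the rising from the descending regime.
In fact, neglecting terms of order $\epsilon^2$, we get

$$\langle \Delta\rho \rangle = \frac{2T}{4k\epsilon + (2\pi k T)^{1/2}} \approx \left( \frac{2T}{\pi k} \right)^{1/2}. \tag{20}$$
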

The behavior of $\Delta\rho$ in a typical ratcheted simulation is displayed in the middle panel
of Fig. 7. Although it is difficult to distinguish a priori where the system is climbing and where
it is descending the folding free-energy barrier, it is reasonable to argue that in the part of
the trajectory in the range 30 < t < 100 the system is climbing, while in the range 110 < t < 125
it is descending. The distributions $p(\Delta\rho)$ associated with these two parts of the
trajectory are displayed in Fig. 8 with solid black and red curves, respectively. The black curve
is fitted by Eq. (16), the correlation coefficient being 0.958. The red curve displays a sharp
peak at low values of $\Delta\rho$, as predicted by the first line of Eq. (13), which gives
$\epsilon = 0.3$, while the remaining part is fitted by the second line of Eq. (13), with a
correlation coefficient of 0.965. This means that, although the molecular force $f_\rho$ certainly
depends on the specific point of the trajectory, the system crosses the free-energy barrier
experiencing an effective force of $f_\rho = 0.75$ and descends it pushed by an effective force
$f_\rho = -1.45$.



The value of $\langle \Delta\rho \rangle$ obtained in Eq. (20) can be used to estimate the order
of magnitude of the threshold that distinguishes the regime where the system is climbing the
free-energy barrier from that in which it is descending, that is the transition state. For
example, in the simulation we performed with T = 0.91 and k = 1, we obtain
$\langle \Delta\rho \rangle = 0.76$ (cf. Fig. 7).
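
The threshold quoted above can be checked numerically. The sketch below evaluates the closed forms of Eq. (18) and Eq. (20) as they follow from the distributions of Eqs. (16) and (19); the function names are hypothetical, and the finite-$\epsilon$ expression neglects terms of order $\epsilon^2$.

```python
import math


def mean_delta_rho_climbing(f_rho, k, temperature):
    """Average ratchet displacement while climbing, Eq. (18), in simplified form."""
    a = f_rho / math.sqrt(2.0 * k * temperature)
    gauss = math.sqrt(2.0 * temperature / (math.pi * k)) * math.exp(-a * a)
    return f_rho / k + gauss / (1.0 + math.erf(a))


def ts_threshold(k, temperature, eps=0.0):
    """Average ratchet displacement at vanishing molecular force, Eq. (20)."""
    return 2.0 * temperature / (4.0 * k * eps
                                + math.sqrt(2.0 * math.pi * k * temperature))
```

For T = 0.91 and k = 1, `ts_threshold(1.0, 0.91)` evaluates to approximately 0.76, and `mean_delta_rho_climbing` reduces to the same value when the force is set to zero, as expected for the boundary between the two regimes.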

V. IDENTIFICATION OF THE TRANSITION STATE: RESULTS



Before applying the strategy discussed above, the actual TS was identified through a
commitment analysis on 10 plain-MD folding trajectories of ACBP. From each of them
we extracted a variable number (from 5 to 10) of conformations chosen in the region where
the value of $d_{CM}$ displays a rapid decrease to low values. From each of these conformations
we started 100 plain-MD simulations, calculating the probability $p_{fold}$ that the simulation
reaches the native basin (operatively defined by $d_{CM} < 19$) before reaching the denatured
basin (operatively defined by $d_{CM} > 25$). The conformations displaying
$0.4 < p_{fold} < 0.6$ are defined as TS conformations. The behavior of $p_{fold}$ with respect
to the value of $d_{CM}$ of the associated conformation is displayed in Fig. 5. The associated
conformations are displayed in Fig. 6(A). They are remarkably native-like, displaying an average
RMSD to the native conformation of 0.68 ± 0.17 nm, and fairly homogeneous, their mutual average
RMSD being 0.85 ± 0.17 nm.

For each trajectory generated with the ratcheting algorithm, we have looked for the TS
in the region where the RMSD to the native conformation was in the range between 0.2
nm and 1 nm. The putative TS is the conformation such that the average value of $\Delta\rho$ in
the preceding 8 ps is larger than that predicted by Eq. (20) and in the following 0.8 ps is
smaller. In this way, we could identify a conformation in 64% of the trajectories at k = 0.1, in
86% of the trajectories at k = 1 and in 49% of the trajectories at k = 20. In no case
is more than one conformation identified.
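
The selection criterion described above can be sketched as a scan over the time series of $\Delta\rho$. This is a minimal illustration under stated assumptions: the window lengths are expressed in frames (the conversion from ps depends on the saving frequency), the first frame that satisfies the criterion is returned, and the function and argument names are hypothetical.

```python
def find_transition_state(delta_rho, threshold, n_before, n_after,
                          rmsd=None, rmsd_range=(0.2, 1.0)):
    """Index of the first frame whose preceding window averages above the
    threshold and whose following window averages below it, or None."""
    for t in range(n_before, len(delta_rho) - n_after + 1):
        # Optional restriction to frames whose RMSD (nm) lies in the search region.
        if rmsd is not None and not (rmsd_range[0] <= rmsd[t] <= rmsd_range[1]):
            continue
        before = sum(delta_rho[t - n_before:t]) / n_before
        after = sum(delta_rho[t:t + n_after]) / n_after
        if before > threshold and after < threshold:
            return t
    return None
```

The threshold argument would be the value given by Eq. (20) for the temperature and ratcheting constant of the simulation, for instance 0.76 at T = 0.91 and k = 1.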

The structural properties of the conformations identified by the above criteria are
summarized in Fig. 9. The average contact-map distance is comparable to that of the actual TS
at all values of k. The structural heterogeneity of the TS conformations decreases slightly
with increasing k, the mutual average RMSD going from 0.85 nm at k = 0 to 0.61
nm at k = 20. The average similarity of the TS conformations obtained from ratcheted
simulations to the actual TS conformations is within the error bars $\sigma$ associated with the
intrinsic variability of the TS conformations (the difference between the two averages being
approximately $0.2\sigma$; black error bars in the figure). Also the RMSD to the native
conformation displays a slight decrease from 0.68 nm at k = 0 to 0.46 nm at k = 20. Summing up,
at all values of k analyzed, ratcheted MD simulations can identify TS conformations which are
structurally similar to the actual TS conformations, becoming slightly more native-like at
increasing k.

A representation of the protein in the TS obtained at k = 1 is displayed in Fig. 6(B).
The main differences between the actual TS and that obtained by ratcheted simulations
at k = 1 involve the termini of the protein. The actual TS displays large fluctuations
in the C-terminal part of the chain and, to a smaller extent, in the N-terminal part and in the
loop region. The ratcheted TS overestimates the fluctuations in the C-terminal region, while
it slightly underestimates those involving the loop. In any case, the two sets are remarkably
similar.



VI. AN EXPLICIT-SOLVENT CASE: THE TRANSITION STATE OF 
ACBP AND CI2 SIMULATED WITH THE AMBER FORCE FIELD 

The ultimate goal of the strategy discussed above is the identification of the transition
state not in simplified protein descriptions, but in realistic explicit-solvent models. In order
to test the algorithm we analyzed the folding and unfolding trajectories generated using the
ratcheting algorithm in previous work. Using the Amber03 force field, we simulated 10 folding and
10 unfolding trajectories of ACBP and CI2 in a dodecahedron box of 261 nm$^3$ solvated with
$\sim 10^4$ TIP3P water molecules, ratcheting along $d_{CM}$ with a ratcheting constant
k = 1 kJ/mol for 50 ns at T = 300 K. All trajectories folded within 0.25 nm from the native
conformation.

The transition state is identified with the same strategy used in the Go-model simulations,
requiring that the average of $\Delta\rho$ in the preceding 8 ps is larger than 1 and in the
following 8 ps is smaller than 2 (this is a somewhat looser condition than for the Go model, but
guarantees the identification of a unique TS for each trajectory), while the RMSD to the native
state should range between 0.3 and 1 nm. The conformations thus obtained are displayed in Fig.
10. They are less homogeneous than those obtained by means of the Go model, the average
mutual RMSD being 0.82 ± 0.19 nm in the case of ACBP and 0.76 ± 0.14 nm in the case of
CI2. Their RMSD to the native state is 0.68 ± 0.19 nm in the case of ACBP and 0.70 ± 0.17
nm in the case of CI2.

In order to validate the TS without carrying out a commitment analysis, which is
extremely time-consuming in explicit solvent, we have compared the TS conformations
obtained from the folding trajectories to the TS conformations obtained from unfolding
trajectories under the same conditions. According to the principle of detailed balance, under
the same conditions the two TS must be identical. The TS conformations obtained in this case
are slightly more native-like, displaying an RMSD to the native conformation of 0.43 ± 0.13
nm for ACBP and 0.62 ± 0.07 nm for CI2. In order to compare the sets of TS conformations
obtained from folding and from unfolding trajectories, we have calculated the average
pairwise RMSD of conformations across the two sets, which is 0.81 ± 0.10 nm for ACBP and
0.76 ± 0.13 nm for CI2.

The average similarity between the folding and the unfolding TS is compatible, within
the error bars, with the intrinsic heterogeneity of each set (their difference is $0.08\sigma$
in the case of ACBP and 0 in the case of CI2), which indicates that the two TS can be regarded
as approximately identical.

VII. CONCLUSIONS 

The complexity of the characterisation of biomolecular processes is driving a continuous
improvement of the experimental and the computational techniques. In particular, in the
field of computer simulations, the last few years have witnessed a leap in the accessible
time scale of plain MD simulations. Nonetheless even these major improvements are not
able to address the complexity of the folding problem for realistic proteins. This points to
the necessity of carrying on with the development of both simplified models and advanced
sampling methods. The present work further validates the use of the ratcheting algorithm
in the study of protein folding and extends its use to the approximate identification, at an
atomic level, of the transition state ensemble of a protein in explicit solvent.

FIG. 1. The specific heat of ACBP (whose structure is displayed in the inset) as a function of
temperature for the model interacting through the modified Go model (solid curve) and through 
a standard Go model (dashed curve). The temperature is expressed in energy units. 

FIG. 2. Comparison of the folding simulations of ACBP ratcheted along $d_{CM}$ with those
generated by plain MD. (upper panel) The average folding time (circles) and the fraction of stuck
trajectories which are not able to reach the native state (squares), as a function of the
ratcheting constant k. The latter is displayed in a logarithmic scale, except in the case of the
points marked as 0, which identify the simulation carried out without ratcheting. (lower panel)
The root-mean-square error (RMSE) between the matrix $M_{ij}$ calculated at k and that calculated
at k = 0 (diamonds), the average distance d between the matrices $M_{ij}$ calculated at k and
those calculated at k = 0 (filled circles, the error bars indicate the standard deviation), and
within the matrices $M_{ij}$ calculated at k (empty circles).

FIG. 3. Comparison of the folding simulations of ACBP ratcheted along the RMSD with those
generated by plain MD. (upper panel) The average folding time (circles) and the fraction of stuck
trajectories which are not able to reach the native state (squares), as a function of the
ratcheting constant k. The latter is displayed in a logarithmic scale, except in the case of the
points marked as 0, which identify the simulation carried out without ratcheting. (lower panel)
The root-mean-square error (RMSE) between the matrix $M_{ij}$ calculated at k and that calculated
at k = 0 (diamonds), the average distance d between the matrices $M_{ij}$ calculated at k and
those calculated at k = 0 (filled circles, the error bars indicate the standard deviation), and
within the matrices $M_{ij}$ calculated at k (empty circles). Here, the values of k are given in
energy units divided by nm.

FIG. 4. Comparison of the folding simulations of CI2 ratcheted along $d_{CM}$ with those
generated by plain MD. (upper panel) The average folding time (circles) and the fraction of stuck
trajectories which are not able to reach the native state (squares), as a function of the
ratcheting constant k. The latter is displayed in a logarithmic scale, except in the case of the
points marked as 0, which identify the simulation carried out without ratcheting. (lower panel)
The root-mean-square error (RMSE) between the matrix $M_{ij}$ calculated at k and that calculated
at k = 0 (diamonds), the average distance d between the matrices $M_{ij}$ calculated at k and
those calculated at k = 0 (filled circles, the error bars indicate the standard deviation), and
within the matrices $M_{ij}$ calculated at k (empty circles).

FIG. 5. The folding probability $p_{fold}$ calculated over a set of conformations extracted from
10 folding simulations. The conformations displaying $p_{fold} = 0.5$ build out, by definition,
the folding transition state.

FIG. 6. Comparison between the Go-model transition-state conformations for ACBP obtained
by plain-MD simulations through the commitment analysis (A) and those obtained by ratcheted 
simulations as explained in the text (B). The width and the color of the average conformations 
denote the RMS fluctuations. 

FIG. 7. The typical behavior of the contact-map distance to the native conformation $d_{CM}$, the
ratchet displacement $\Delta\rho$ and the RMSD to the native conformation in a folding trajectory
ratcheted with k = 1. The light curve in the middle panel is a 0.8-ps running average of the
underlying curve. The vertical dashed bar marks the TS identified in the trajectory.

FIG. 8. The histogram of $\Delta\rho$ obtained from the parts of the trajectory of Fig. 7 which
range between nominal times 30 and 100 ps, presumably corresponding to the climbing of the folding
free-energy barrier (black solid curve), and between 110 and 125 ps, presumably corresponding to
the descent to the native state (red solid curve). The fits obtained with Eq. (16) (dashed black
curve) and Eq. (13) (dashed red curve) are also indicated.

FIG. 9. Structural properties of transition-state conformations obtained at various values of k
and obtained from plain MD simulations (k = 0). (upper panel) The average distance between the
contact map of TS conformations and that of the native state. (lower panel) The average RMSD
between pairs of TS conformations at each value of k (black squares), the average RMSD between
TS conformations obtained at different values of k and those obtained by plain MD simulations (red
diamonds) and average RMSD to the native conformation (blue circles). The error bars indicate
the standard deviation.

FIG. 10. The conformations corresponding to the transition state of ACBP (a) and CI2 (b) from
the explicit-solvent ratcheted simulations. The thickness of the surface indicates the standard
deviation associated with the average structure.